Evaluation of architectural support for speech codecs application in large-scale parallel machines

Author

  • Naeem Zafar Azeemi
Abstract

Next-generation multimedia mobile phones that use the high-bandwidth 3G cellular radio network consume more power. Multimedia algorithms such as speech and video transcodecs have very large instruction footprints and consequently stall on instruction cache misses. Conflicts in on-chip caches account for a large fraction of the CPU cycle penalty and hence increase power consumption. Many commercial tools have been developed to minimize such cache misses by carefully placing the frequently called procedures of an application. After profile extraction, these tools use cache line coloring and post-compilation techniques to optimize cache hits. The major impediment to optimal performance of such tools is their static layout profile, which does not consider the dynamic behavior of the application. We propose a methodology called DCP (dynamic code placement) that positions code at run time for good instruction cache performance, and we have implemented it on high-end processors. The dynamic application profile is completely transparent to the developer's code. The technique optimizes the code footprint in the memory layout of a program and improves i-cache mapping to increase the number of cache hits and ultimately reduce CPU stalls. Our optimization draws on static as well as detailed run-time profile information that captures the relevant temporal behavior of the applications. Moreover, the effect of inter-procedural code positioning is considered when mapping code into the instruction cache. The results also show an improvement over the Pettis and Hansen (PH) approach. Although the majority of multimedia applications can be optimized by our framework, applications dominated by function pointers do not work correctly. The technique incurs low overhead and enhances the correlation between cache hits and the architecture. For a range of applications we show that instruction miss rates can be reduced by 19-68%. Using a simple model, this corresponds to an execution time reduction of 23-85% and an increase in parallelism of 4-53%.

Keywords: wireless applications, embedded systems, low energy, cache optimization
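
The link between procedure placement and instruction cache conflicts can be illustrated with a toy model. The sketch below is not the DCP method or the Pettis and Hansen algorithm. It is a minimal, static illustration that assumes a direct-mapped cache, and its procedure names, sizes, cache geometry, and call trace are all hypothetical. It only shows that moving two frequently alternating procedures so they no longer map to the same cache sets removes most of the misses.

```python
def count_misses(layout, sizes, trace, line_size=32, num_lines=64):
    """Count instruction-line misses in a direct-mapped cache.

    layout: procedure name -> start address in bytes (hypothetical)
    sizes:  procedure name -> code size in bytes (hypothetical)
    trace:  sequence of procedure names in call order
    Each call is modeled as fetching every line of the procedure once.
    """
    cache = [None] * num_lines  # cache[set] = memory line currently resident
    misses = 0
    for proc in trace:
        first = layout[proc] // line_size
        last = (layout[proc] + sizes[proc] - 1) // line_size
        for mem_line in range(first, last + 1):
            idx = mem_line % num_lines  # direct-mapped set index
            if cache[idx] != mem_line:
                cache[idx] = mem_line
                misses += 1
    return misses


# Hypothetical workload: one cold start-up routine, then two hot routines
# that call each other in alternation (as in a codec's inner loop).
sizes = {"encode": 1024, "init": 1024, "filter": 1024}
trace = ["init"] + ["encode", "filter"] * 100

# Source-order layout: "encode" and "filter" are 2048 bytes apart, which is
# exactly the cache size (32 * 64), so they map to the same sets and evict
# each other on every call.
naive_layout = {"encode": 0, "init": 1024, "filter": 2048}

# Profile-aware layout: the two hot routines are adjacent and occupy
# disjoint sets; only the cold "init" shares sets with "encode".
tuned_layout = {"encode": 0, "filter": 1024, "init": 2048}

naive_misses = count_misses(naive_layout, sizes, trace)
tuned_misses = count_misses(tuned_layout, sizes, trace)
print(naive_misses, tuned_misses)
```

A run-time scheme such as the one the abstract describes would derive the call trace from dynamic profile information rather than from a fixed list as assumed here.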

Similar resources

Heuristic approach to solve hybrid flow shop scheduling problem with unrelated parallel machines

In the hybrid flow shop (HFS) scheduling problem with unrelated parallel machines, a set of n jobs is processed on k machines. A mixed integer linear programming (MILP) model for the HFS scheduling problem with unrelated parallel machines has been proposed to minimize the maximum completion time (makespan). Since the problem is shown to be NP-complete, it is necessary to use heuristic methods to ...

A comparison of algorithms for minimizing the sum of earliness and tardiness in hybrid flow-shop scheduling problem with unrelated parallel machines and sequence-dependent setup times

In this paper, the flow-shop scheduling problem with unrelated parallel machines at each stage and sequence-dependent setup times is studied, with the objective of minimizing the sum of earliness and tardiness. The processing times, setup times and due dates are known in advance. To solve the problem, we introduce a hybrid memetic algorithm as well as a particle swarm optimization algorithm combine...

Solving the Problem of Scheduling Unrelated Parallel Machines with Limited Access to Jobs

Nowadays, with the successful application of the on-time production concept to areas such as production management and storage, the need to complete jobs by their delivery time is considered a key issue in industrial environments. Unrelated parallel machine scheduling is a general form of the classic parallel machine problems. In some of the applications of unrelated parallel mac...

A New Lower Bound for Flexible Flow Shop Problem with Unrelated Parallel Machines

The flexible flow shop (FFS) scheduling problem with unrelated parallel machines involves sequencing in a flow shop where, at any stage, there are one or more processors. The objective is to minimize the maximum completion time. Because the FFS problem is NP-complete, it is necessary to use heuristic methods to address moderate to large-scale problems. Therefore, for assessme...

Publication date: 2007